224 research outputs found

    A new approach to hierarchical data analysis: Targeted maximum likelihood estimation for the causal effect of a cluster-level exposure

    We often seek to estimate the impact of an exposure naturally occurring or randomly assigned at the cluster-level. For example, the literature on neighborhood determinants of health continues to grow. Likewise, community randomized trials are applied to learn about real-world implementation, sustainability, and population effects of interventions with proven individual-level efficacy. In these settings, individual-level outcomes are correlated due to shared cluster-level factors, including the exposure, as well as social or biological interactions between individuals. To flexibly and efficiently estimate the effect of a cluster-level exposure, we present two targeted maximum likelihood estimators (TMLEs). The first TMLE is developed under a non-parametric causal model, which allows for arbitrary interactions between individuals within a cluster. These interactions include direct transmission of the outcome (i.e. contagion) and influence of one individual's covariates on another's outcome (i.e. covariate interference). The second TMLE is developed under a causal sub-model assuming the cluster-level and individual-specific covariates are sufficient to control for confounding. Simulations compare the alternative estimators and illustrate the potential gains from pairing individual-level risk factors and outcomes during estimation, while avoiding unwarranted assumptions. Our results suggest that estimation under the sub-model can result in bias and misleading inference in an observational setting. Incorporating working assumptions during estimation is more robust than assuming they hold in the underlying causal model. We illustrate our approach with an application to HIV prevention and treatment

    Statistical Learning of Origin-Specific Statically Optimal Individualized Treatment Rules

    Consider a longitudinal observational or controlled study in which one collects chronological data over time on n randomly sampled subjects. The time-dependent process one observes on each randomly sampled subject contains time-dependent covariates, time-dependent treatment actions, and an outcome process or single final outcome of interest. A statically optimal individualized treatment rule (as introduced in van der Laan, Petersen & Joffe (2005), Petersen & van der Laan (2006)) is a (unknown) treatment rule which at any point in time conditions on a user-supplied subset of the past, computes the future static treatment regimen that maximizes a (conditional) mean future outcome of interest, and applies the first treatment action of the latter regimen. In particular, Petersen & van der Laan (2006) clarified that, in order to be statically optimal, an individualized treatment rule should not depend on the observed treatment mechanism. Petersen & van der Laan (2006) further developed estimators of statically optimal individualized treatment rules based on a past capturing all confounding of past treatment history on outcome. In practice, however, one typically wishes to find individualized treatment rules responding to a user-supplied subset of the complete observed history, which may not be sufficient to capture all confounding. The current article provides an important advance on Petersen & van der Laan (2006) by developing locally efficient double robust estimators of statically optimal individualized treatment rules responding to such a user-supplied subset of the past. However, failure to capture all confounding comes at a price; the static optimality of the resulting rules becomes origin-specific. We explain origin-specific static optimality, and discuss the practical importance of the proposed methodology. 
    We further present the results of a data analysis in which we estimate a statically optimal rule for switching antiretroviral therapy among patients infected with resistant HIV virus

    History-Adjusted Marginal Structural Models and Statically-Optimal Dynamic Treatment Regimes

    Marginal structural models (MSM) provide a powerful tool for estimating the causal effect of a treatment. These models, introduced by Robins, model the marginal distributions of treatment-specific counterfactual outcomes, possibly conditional on a subset of the baseline covariates. Marginal structural models are particularly useful in the context of longitudinal data structures, in which each subject's treatment and covariate history are measured over time, and an outcome is recorded at a final time point. However, the utility of these models for some applications has been limited by their inability to incorporate modification of the causal effect of treatment by time-varying covariates. Particularly in the context of clinical decision making, such time-varying effect modifiers are often of considerable or even primary interest, as they are used in practice to guide treatment decisions for an individual. In this article we propose a generalization of marginal structural models, which we call history-adjusted marginal structural models (HA-MSM). These models allow estimation of adjusted causal effects of treatment, given the observed past, and are therefore more suitable for making treatment decisions at the individual level and for identification of time-dependent effect modifiers. Specifically, a HA-MSM models the conditional distribution of treatment-specific counterfactual outcomes, conditional on the whole or a subset of the observed past up till a time-point, simultaneously for all time-points. Double robust inverse probability of treatment weighted estimators have been developed and studied in detail for standard MSM. We extend these results by proposing a class of double robust inverse probability of treatment weighted estimators for the unknown parameters of the HA-MSM.
    In addition, we show that HA-MSM provide a natural approach to identifying the dynamic treatment regime which follows, at each time-point, the history-adjusted (up till the most recent time point) optimal static treatment regime. We illustrate our results using an example drawn from the treatment of HIV infection

    History-Adjusted Marginal Structural Models: Time-Varying Effect Modification

    Marginal structural models (MSM) provide a powerful tool for estimating the causal effect of a treatment, particularly in the context of longitudinal data structures. These models, introduced by Robins, model the marginal distributions of treatment-specific counterfactual outcomes, possibly conditional on a subset of the baseline covariates. However, standard MSM cannot incorporate modification of treatment effects by time-varying covariates. In the context of clinical decision-making, such time-varying effect modifiers are often of considerable interest, as they are used in practice to guide treatment decisions for an individual. In this article we introduce a generalization of marginal structural models, which we call history-adjusted marginal structural models (HA-MSM). These models allow estimation of adjusted causal effects of treatment, given the observed past, and are therefore more suitable for making treatment decisions at the individual level and for identification of time-dependent effect modifiers. We provide a practical introduction to HA-MSM relying on an example drawn from the treatment of HIV, and discuss parameters estimated, assumptions, and implementation using standard software

    Estimation of Direct Causal Effects

    Many common problems in epidemiologic and clinical research involve estimating the effect of an exposure on an outcome while blocking the exposure's effect on an intermediate variable. Effects of this kind are termed direct effects. Estimation of direct effects arises frequently in research aimed at understanding mechanistic pathways by which an exposure acts to cause or prevent disease, as well as in many other settings. Although multivariable regression is commonly used to estimate direct effects, this approach requires assumptions beyond those required for the estimation of total causal effects. In addition, multivariable regression estimates a particular type of direct effect, the effect of an exposure on outcome fixing the intermediate at a specified level. Using the counterfactual framework, we distinguish this definition of a direct effect (Type 1 direct effect) from an alternative definition, in which the effect of the exposure on the intermediate is blocked, but the intermediate is otherwise allowed to vary as it would in the absence of exposure (Type 2 direct effect). When the intermediate and exposure interact to affect the outcome these two types of direct effects address distinct research questions. Relying on examples, we illustrate the difference between Type 1 and Type 2 direct effects. We propose an estimation approach for Type 2 direct effects that can be implemented using standard statistical software and illustrate its implementation using a numerical example. We also review the assumptions underlying our approach, which are less restrictive than those proposed by previous authors
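
    The two definitions in this abstract can be written side by side in counterfactual notation. The symbols below are assumptions introduced for illustration and do not come from the listing: A is a binary exposure, Z the intermediate, Y_{a,z} the outcome under exposure a with the intermediate set to z, and Z_0 the value the intermediate would take in the absence of exposure.

```latex
% Type 1 direct effect: the intermediate is fixed at a chosen level z.
\mathrm{DE}_1(z) = E\left[ Y_{1,z} - Y_{0,z} \right]

% Type 2 direct effect: the intermediate takes its unexposed value Z_0.
\mathrm{DE}_2 = E\left[ Y_{1,Z_0} - Y_{0,Z_0} \right]
```

    Under this notation, if the individual contrast Y_{1,z} - Y_{0,z} does not depend on z (no exposure-intermediate interaction), the two quantities coincide for every z; when it does depend on z, they answer different questions, as the abstract notes.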

    Estimation of Direct and Indirect Causal Effects in Longitudinal Studies

    The causal effect of a treatment on an outcome is generally mediated by several intermediate variables. Estimation of the component of the causal effect of a treatment that is mediated by a given intermediate variable (the indirect effect of the treatment), and the component that is not mediated by that intermediate variable (the direct effect of the treatment) is often relevant to mechanistic understanding and to the design of clinical and public health interventions. Under the assumption of no-unmeasured confounders, Robins & Greenland (1992) and Pearl (2000) develop two identifiability results for direct and indirect causal effects. They define an individual direct effect as the counterfactual effect of a treatment on an outcome when the intermediate variable is set at the value it would have had if the individual had not been treated, and the population direct effect as the mean of these individual counterfactual direct effects. The identifiability result developed by Robins & Greenland (1992) relies on an additional "No-Interaction Assumption", while the identifiability result developed by Pearl (2000) relies on a particular assumption about conditional independence in the population being sampled. Both assumptions are considered very restrictive. As a result, estimation of direct and indirect effects has been considered infeasible in many settings. We show that the identifiability result of Pearl (2000) also holds under a new conditional independence assumption which states that, within strata of baseline covariates, the individual direct effect at a fixed level of the intermediate variable is independent of the no-treatment counterfactual intermediate variable. We argue that our assumption is typically less restrictive than both the assumption of Pearl (2000) and the "No-Interaction Assumption" of Robins & Greenland (1992).
    We also generalize the current definition of the direct (and indirect) effect of a treatment as the population mean of individual counterfactual direct (and indirect) effects to 1) a general parameter of the population distribution of individual counterfactual direct (and indirect) effects, and 2) change of a general parameter of the population distribution of the appropriate counterfactual treatment-specific outcome. Subsequently, we generalize our identifiability result for the mean to identifiability results for these generally defined direct effects. We also discuss methods for modelling, testing, and estimation, and we illustrate our results throughout using an example drawn from the treatment of HIV infection

    Direct Effect Models

    The causal effect of a treatment on an outcome is generally mediated by several intermediate variables. Estimation of the component of the causal effect of a treatment that is mediated by a given intermediate variable (the indirect effect of the treatment), and the component that is not mediated by that intermediate variable (the direct effect of the treatment) is often relevant to mechanistic understanding and to the design of clinical and public health interventions. Under the assumption of no-unmeasured confounders for treatment and the intermediate variable, Robins & Greenland (1992) define an individual direct effect as the counterfactual effect of a treatment on an outcome when the intermediate variable is set at the value it would have had if the individual had not been treated, and the population direct effect as the mean of these individual counterfactual direct effects. In this article we first generalize this definition of a direct effect. Given a user-supplied model for the population direct effect of treatment actions, possibly conditional on a user-supplied subset of the baseline co-variables, we propose inverse probability of treatment weighted estimators, likelihood-based estimators, and double robust inverse probability of treatment weighted estimators of the unknown parameters of this model. The inverse probability of treatment weighted estimator corresponds with a weighted regression and can thus be implemented with standard software
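
    The closing remark, that an inverse probability of treatment weighted estimator amounts to a weighted regression, can be illustrated with a minimal sketch. It is a generic point-treatment example with a single binary confounder, not the direct effect estimator proposed in the paper. The function name `iptw_effect`, the simulated data, and the stratum-wise estimate of the treatment mechanism are all assumptions made for illustration.

```python
import numpy as np

def iptw_effect(w, a, y):
    """Weighted least squares of y on (1, a) with weights 1 / P(A = a | W).

    Returns the coefficient on a. Illustrative only: W is assumed discrete,
    and every stratum of W is assumed to contain both treated and untreated units.
    """
    w = np.asarray(w)
    a = np.asarray(a)
    y = np.asarray(y, dtype=float)

    # Estimate P(A = 1 | W) by the empirical treated proportion in each stratum.
    g1 = np.empty(a.size)
    for level in np.unique(w):
        mask = w == level
        g1[mask] = a[mask].mean()

    # Weight each unit by the inverse probability of the treatment it received.
    weights = 1.0 / np.where(a == 1, g1, 1.0 - g1)

    # Solve the weighted normal equations (X' W X) beta = X' W y.
    X = np.column_stack([np.ones(a.size), a])
    XtW = X.T * weights
    beta = np.linalg.solve(XtW @ X, XtW @ y)
    return beta[1]

# Simulated data (hypothetical): W confounds A and Y; the true effect of A is 2.
rng = np.random.default_rng(0)
n = 20000
w = rng.binomial(1, 0.5, size=n)
a = rng.binomial(1, 0.2 + 0.6 * w)
y = 2.0 * a + 3.0 * w + rng.normal(size=n)

naive = y[a == 1].mean() - y[a == 0].mean()   # confounded comparison
iptw = iptw_effect(w, a, y)                   # weighted regression estimate
print(f"naive: {naive:.2f}, IPTW: {iptw:.2f}")
```

    In this simulation the unweighted comparison is pulled away from the true effect by the confounder, while the weighted regression should land close to it. Any software that supports weighted least squares could be substituted for the hand-written normal equations.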

    History-Adjusted Marginal Structural Models: Optimal Treatment Strategies

    Much of clinical medicine involves choosing a future treatment plan that is expected to optimize a patient's long-term outcome, and modifying this treatment plan over time in response to changes in patient characteristics. However, dynamic treatment regimens, or decision rules for altering treatment in response to time-varying covariates, are rarely estimated based on observational data. In a companion paper, we introduced a generalization of Marginal Structural Models, named History-Adjusted Marginal Structural Models, that estimate modification of causal effects by time-varying covariates. Here, we illustrate how History-Adjusted Marginal Structural Models can be used to identify a specific type of optimal dynamic treatment regimen. Estimation and interpretation of this dynamic treatment regimen are illustrated using an example drawn from the treatment of HIV infection using antiretroviral drugs

    Computationally Efficient Confidence Intervals for Cross-validated Area Under the ROC Curve Estimates

    In binary classification problems, the area under the ROC curve (AUC) is an effective means of measuring the performance of your model. Most often, cross-validation is also used, in order to assess how the results will generalize to an independent data set. In order to evaluate the quality of an estimate for cross-validated AUC, we must obtain an estimate for its variance. For massive data sets, the process of generating a single performance estimate can be computationally expensive. Additionally, when using a complex prediction method, calculating the cross-validated AUC on even a relatively small data set can still require a large amount of computation time. Thus, when the cost of obtaining a single estimate for cross-validated AUC is significant, the bootstrap, as a means of variance estimation, can be computationally intractable. As an alternative to the bootstrap, we demonstrate a computationally efficient influence curve based approach to obtaining a variance estimate for cross-validated AUC
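
    The idea of an influence curve based variance can be sketched for the simpler, non-cross-validated AUC on a single sample. This is an illustrative assumption-laden sketch rather than the authors' implementation: the function name `auc_ic_variance` is hypothetical, ties are counted as one half, and the cross-validated estimator described in the abstract would, roughly, combine fold-specific versions of the same quantities.

```python
import numpy as np

def auc_ic_variance(y, score):
    """Empirical AUC plus a variance estimate from its estimated influence curve.

    y is a 0/1 label array and score the predicted scores. Uses O(n * n_class)
    memory, which is fine for a sketch but not for massive data sets.
    """
    y = np.asarray(y)
    score = np.asarray(score, dtype=float)
    n = y.size
    pos = score[y == 1]
    neg = score[y == 0]
    p1 = pos.size / n   # empirical P(Y = 1)
    p0 = neg.size / n   # empirical P(Y = 0)

    # AUC = P(negative score < positive score), counting ties as one half.
    diff = pos[:, None] - neg[None, :]
    auc = ((diff > 0) + 0.5 * (diff == 0)).mean()

    # For each observation: fraction of negatives scored below it, and
    # fraction of positives scored above it (ties again counted as one half).
    s = score[:, None]
    frac_neg_below = ((s > neg[None, :]) + 0.5 * (s == neg[None, :])).mean(axis=1)
    frac_pos_above = ((pos[None, :] > s) + 0.5 * (pos[None, :] == s)).mean(axis=1)

    # Estimated influence curve: positives use the first term, negatives the second.
    ic = np.where(y == 1, (frac_neg_below - auc) / p1, (frac_pos_above - auc) / p0)

    # The variance of the estimator is approximated by the sample variance of the IC over n.
    var = np.mean(ic ** 2) / n
    return auc, var, ic

auc, var, ic = auc_ic_variance([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
se = np.sqrt(var)
print(f"AUC = {auc:.3f}, 95% CI = ({auc - 1.96 * se:.3f}, {auc + 1.96 * se:.3f})")
```

    The estimated influence curve values average to zero by construction, and a single pass over the data yields both the point estimate and its standard error, which is what makes this kind of approach cheap compared with repeatedly refitting under the bootstrap.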

    Adaptive Matching in Randomized Trials and Observational Studies

    In many randomized and observational studies the allocation of treatment among a sample of n independent and identically distributed units is a function of the covariates of all sampled units. As a result, the treatment labels among the units are possibly dependent, complicating estimation and posing challenges for statistical inference. For example, cluster randomized trials frequently sample communities from some target population, construct matched pairs of communities from those included in the sample based on some metric of similarity in baseline community characteristics, and then randomly allocate a treatment and a control intervention within each matched pair. In this case, the observed data can neither be represented as the realization of n independent random variables, nor, contrary to current practice, as the realization of n/2 independent random variables (treating the matched pair as the independent sampling unit). In this paper we study estimation of the average causal effect of a treatment under experimental designs in which treatment allocation potentially depends on the pre-intervention covariates of all units included in the sample. We define efficient targeted minimum loss based estimators for this general design, present a theorem that establishes the desired asymptotic normality of these estimators and allows for asymptotically valid statistical inference, and discuss implementation of these estimators. We further investigate the relative asymptotic efficiency of this design compared with a design in which unit-specific treatment assignment depends only on the units' covariates. Our findings have practical implications for the optimal design and analysis of pair matched cluster randomized trials, as well as for observational studies in which treatment decisions may depend on characteristics of the entire sample